289 research outputs found

Recommended from our members

Rank-Aware Subspace Clustering for Structured Datasets

Author: Amer-Yahia Sihem
Stoyanovich Julia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009

In online applications such as Yahoo! Personals and Trulia.com, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in ranked lists are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths). However, not all groups will be of interest to the user given the ranking criteria. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop a novel rank-aware bottom-up subspace clustering algorithm. We evaluate the performance of our algorithm over large datasets from a leading online dating site, and present an experimental evaluation of its effectiveness.

Columbia University Academic Commons

Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

Author: Amer-Yahia Sihem
Kirchgessner Martin
Leroy Vincent
Mishra Shashwat
Publication date: 15/03/2016

Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy pâté also buy salted butter and sour bread." Unfortunately, sifting through a high number of buying patterns is not useful in practice, because of the predominance of popular products in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are more appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability for an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
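To make the notion of an interestingness measure concrete, here is a minimal sketch of three widely used ones (support, confidence, and lift) for a single rule. It is not CAPA and the abstract does not say which 34 measures were compared; the function name `rule_measures` and the toy baskets are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List, Set


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Compute support, confidence and lift for the rule antecedent -> consequent.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    a = frozenset(antecedent)
    b = frozenset(consequent)
    n = len(transactions)

    # Count the baskets that contain each side of the rule, and both sides.
    count_a = sum(1 for t in transactions if a <= t)
    count_b = sum(1 for t in transactions if b <= t)
    count_ab = sum(1 for t in transactions if (a | b) <= t)

    support = count_ab / n if n else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (count_b / n) if count_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy baskets (made-up data).
baskets = [
    {"pate", "butter"},
    {"pate", "butter", "bread"},
    {"butter"},
    {"bread"},
]
measures = rule_measures(baskets, ["pate"], ["butter"])
```

Ranking the same set of rules by each measure in turn can produce different orderings, which is the kind of disagreement a comparison framework has to surface.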

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Personalizing XML Full Text Search in PIMENTO

Author: Amer-Yahia Sihem
Fundulaki Irini
Laks Lakshmanan
Publication venue: Dagstuhl Seminar Proceedings. 08111 - Ranked XML Querying
Publication date: 01/01/2008

In PIMENTO we advocate a novel approach to XML search that leverages user information to return more relevant query answers. This approach is based on formalizing user profiles in terms of scoping rules, which are used to rewrite an input query, and of ordering rules, which are combined with query scoring to customize the ranking of query answers to specific users.

Dagstuhl Research Online Publication Server

Crowd4U: An Initiative for Constructing an Open Academic Crowdsourcing Network

Author: Amer-Yahia Sihem
Morishima Atsuyuki
Roy Senjuti
Publication venue: HAL CCSD
Publication date: 01/01/2014

We describe the Crowd4U initiative, which aims at constructing an all-academic open and generic platform for microvolunteering and crowdsourcing worldwide. Crowd4U provides a microtask-based platform in which most workers are volunteers at universities and other research institutions. Crowd4U is open in the sense that the platform can interact with other platforms, researchers can register their tasks, and the underlying code is not a black box. It is generic in that it allows virtually any task to be registered. Crowd4U has already been used by several projects for public and academic purposes.

Distributed Evaluation of Top-k Temporal Joins

Author: Amer-Yahia Sihem
Leroy Vincent
Pilourdault Julien
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2016

To appear in SIGMOD'16. We study a particular kind of join, coined Ranked Temporal Join (RTJ), featuring predicates that compare time intervals and a scoring function associated with each predicate to quantify how well it is satisfied. RTJ queries are prevalent in a variety of applications such as network traffic monitoring, task scheduling, and tweet analysis. RTJ queries are often best interpreted as top-k queries where only the best matches are returned. We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ, an efficient query evaluation approach on a distributed Map-Reduce architecture. TKIJ relies on an offline statistics computation that, given a time partitioning into granules, computes the distribution of intervals' endpoints in each granule, and an online computation that generates query-dependent score bounds. Those statistics are used for workload assignment to reducers. This aims at reducing data replication, to limit I/O cost. Additionally, high-scoring results are distributed evenly to enable each reducer to prune unnecessary results. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data.
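As a rough illustration of what a top-k ranked temporal join computes, the sketch below scores every pair of intervals and keeps the k best. It is a naive single-machine baseline, not TKIJ: there is no partitioning into granules, no score bounds, and no pruning. The scoring function `overlap_score` (overlap length divided by the total span of the two intervals) is an assumption chosen for the example; the abstract does not specify the paper's scoring semantics.

```python
import heapq
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), with start <= end


def overlap_score(a: Interval, b: Interval) -> float:
    """Hypothetical score in [0, 1]: how well 'a overlaps b' is satisfied."""
    (s1, e1), (s2, e2) = a, b
    inter = max(0.0, min(e1, e2) - max(s1, s2))
    span = max(e1, e2) - min(s1, s2)
    return inter / span if span > 0 else 0.0


def top_k_temporal_join(
    left: List[Interval], right: List[Interval], k: int
) -> List[Tuple[float, Interval, Interval]]:
    """Score all pairs and return the k highest-scoring ones, best first."""
    scored = ((overlap_score(a, b), a, b) for a, b in product(left, right))
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# Made-up intervals for illustration.
left_intervals = [(0, 10), (5, 15)]
right_intervals = [(0, 10), (20, 30)]
best = top_k_temporal_join(left_intervals, right_intervals, k=2)
```

This baseline evaluates every pair, so its cost grows with the product of the input sizes; avoiding that exhaustive work is the motivation for the statistics and score bounds described in the abstract.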

Crossref

Hal - Université Grenoble Alpes

Profile Diversity for Phenotyping Data Search and Recommendation

Author: Amer-Yahia Sihem
Neveu Pascal
Pacitti Esther
Servajean Maximilien
Publication venue: HAL CCSD
Publication date: 22/10/2013

Session: Innovative Applications. In this work, we study profile diversity, a novel approach to searching scientific documents. Many works have combined keyword relevance with document popularity in a "social" scoring function. Diversifying the content of returned documents has also been studied in depth in search, advertising, database queries, and recommendation. We believe our work is the first to address profile diversity in order to tackle the problem of result lists that are highly popular but too narrowly focused. We show how we adapt Fagin's threshold algorithms to return the documents that are the most relevant and the most popular, but also the most diverse in terms of either content or profiles. We also report a set of simulations on two benchmarks to validate our scoring function.
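For readers unfamiliar with the threshold algorithm mentioned in this abstract, the following is a minimal sketch of the classic version with a sum aggregate over per-criterion score lists. It does not include the diversity component, which the abstract does not detail; the function name, the two criteria (relevance and popularity), and the scores are assumptions for illustration, and each object is assumed to appear in every list.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_algorithm(
    lists: List[Dict[str, float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k objects with the highest summed score, best first.

    Each dict maps object id -> score for one criterion (hypothetical input).
    """
    # Sorted access: one descending-score view per criterion.
    sorted_lists = [sorted(d.items(), key=lambda kv: -kv[1]) for d in lists]
    seen: Dict[str, float] = {}
    max_depth = max((len(s) for s in sorted_lists), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                threshold += score
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = sum(d.get(obj, 0.0) for d in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once k seen objects score at least the threshold: no unseen
        # object can have a higher aggregate score.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Made-up scores for three documents.
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
popularity = {"d1": 0.2, "d2": 0.9, "d3": 0.3}
top_one = threshold_algorithm([relevance, popularity], k=1)
```

The early stop relies on the aggregate being monotone: with a sum, the threshold built from the scores last seen under sorted access is an upper bound on the score of any object not yet encountered.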

Hal - Université Grenoble Alpes

Exploration of User Groups in VEXUS

Author: Amer-Yahia Sihem
Comba Joao
Moreira Viviane
Omidvar-Tehrani Behrooz
Zegarra Fabian Colque
Publication date: 10/12/2017

We introduce VEXUS, an interactive visualization framework for exploring user data to fulfill tasks such as finding a set of experts, forming discussion groups, and analyzing collective behaviors. User data is characterized by a combination of demographics, such as age and occupation, and actions, such as rating a movie, writing a paper, following a medical treatment, or buying groceries. The ubiquity of user data requires tools that help explorers, be they specialists or novice users, acquire new insights. VEXUS lets explorers interact with user data via visual primitives and builds an exploration profile to recommend the next exploration steps. VEXUS combines state-of-the-art visualization techniques with appropriate indexing of user data to provide fast and relevant exploration.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes